Irrigation Data

This is a test of an R Markdown file knitted to an HTML file. It walks through tidying the data, analysing it, and finally plotting it.

To print a nicely formatted table of your dataset, you can use the kable() function from the knitr package, together with the kableExtra package for styling:

install.packages("kableExtra")

The result looks like the table below:

year  Africa  Europe  N..America  S..America
1980     9.3    18.8        21.2        12.7
1990    11.0    25.3        21.6        15.5
2000    13.2    26.7        23.3        17.3
2007    13.6    26.3        23.8        17.3
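
The code that builds and prints this table is not shown here. A minimal sketch, assuming the data frame is called irrig2 (the name used in the rest of this document) and typing the values in by hand from the table above rather than reading them from the original file:

```r
# Rebuild the table by hand; the original data source is not shown,
# so these values are copied from the table above.
irrig2 <- data.frame(
  year       = c(1980, 1990, 2000, 2007),
  Africa     = c(9.3, 11.0, 13.2, 13.6),
  Europe     = c(18.8, 25.3, 26.7, 26.3),
  N..America = c(21.2, 21.6, 23.3, 23.8),
  S..America = c(12.7, 15.5, 17.3, 17.3)
)

# Print a formatted table if knitr is installed, otherwise fall back to print()
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(irrig2))
} else {
  print(irrig2)
}
```

Inside an R Markdown chunk the explicit print() is not needed; calling knitr::kable(irrig2) on its own is enough.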

Analysing The Data

1. What Does the Data Represent?

The dataset has a year column and one column per continent. Without knowing more about it, you could guess that the values are rates. The dataset is titled “irrigation”, so they are probably irrigation rates for each continent over the years.

1.1 Is it a Tidy Dataset?

If we try to look up a value with a filter or an index, for instance the irrigation rate in Africa in 1990, is that possible?

Here is an index with a logical condition:

irrig2$Africa[irrig2$year == 1990]
## [1] 11

The same row for 1990, with all the other continents:

irrig2[irrig2$year == 1990, ]
##   year Africa Europe N..America S..America
## 2 1990     11   25.3       21.6       15.5

Or using a filter from dplyr:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:kableExtra':
## 
##     group_rows
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
irrig2 %>% filter(year == 1990) %>% select(Africa)
##   Africa
## 1     11

So why make the data tidy?
Tidy data takes less effort to look up an observation in, and it is easier to plot.

1.2 Making it Tidy!

The dataset stores the continents as columns (variables), although each continent is really a value.
Instead of picking out a different column for every lookup, we create a single continent variable.

Pivot Longer

library(tidyr)
irrig <- irrig2 %>% pivot_longer(!year, names_to = "continent", values_to = "rate")
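
To see what the long format buys us, the sketch below rebuilds the table by hand (values copied from the table at the top, since the loading step is not shown) and reshapes it with base R's stack() instead of pivot_longer(), so that it runs without extra packages. The name irrig_long is made up for this example. The row order differs from pivot_longer() (all years for one continent come first), but the columns are the same.

```r
# Values copied by hand from the table at the top of the document
irrig2 <- data.frame(
  year       = c(1980, 1990, 2000, 2007),
  Africa     = c(9.3, 11.0, 13.2, 13.6),
  Europe     = c(18.8, 25.3, 26.7, 26.3),
  N..America = c(21.2, 21.6, 23.3, 23.8),
  S..America = c(12.7, 15.5, 17.3, 17.3)
)

# stack() turns the four continent columns into a values/ind pair
stacked <- stack(irrig2[, -1])

# irrig_long is a hypothetical name; it mirrors the year/continent/rate layout
irrig_long <- data.frame(
  year      = rep(irrig2$year, times = ncol(irrig2) - 1),
  continent = as.character(stacked$ind),
  rate      = stacked$values
)

# One condition on year and continent now finds any observation
irrig_long$rate[irrig_long$year == 1990 & irrig_long$continent == "Africa"]
## [1] 11
```

The lookup no longer depends on which column a continent happens to live in, which is the point of tidying.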

1.3 Analysing the Data

We’d like to know which year has the minimum irrigation rate:

irrig$year[which.min(irrig$rate)]
## [1] 1980

Which continent?

irrig$continent[which.min(irrig$rate)]
## [1] "Africa"

What was the rate?

min(irrig$rate)
## [1] 9.3
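
Note that square brackets expect a position, not a value: min() returns the smallest rate itself, while which.min() returns the position where it occurs. A small sketch using the Africa column and the years from the table at the top (the vector names year and africa are chosen for this example only):

```r
# Values copied by hand from the Africa column of the table
year   <- c(1980, 1990, 2000, 2007)
africa <- c(9.3, 11.0, 13.2, 13.6)

min(africa)        # the smallest value: 9.3
which.min(africa)  # its position: 1

year[which.min(africa)]  # 1980, the year that goes with the minimum
year[min(africa)]        # NA: 9.3 is truncated to position 9, which does not exist here
```

On a longer vector, position 9 does exist, so indexing with min() returns some unrelated element without any warning.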

We can also use summarise() to get all the descriptive statistics at once, instead of querying them one by one:

irrig %>% group_by(continent) %>% summarise(min=min(rate), max=max(rate), mean=mean(rate), sd=sd(rate))
## # A tibble: 4 x 5
##   continent    min   max  mean    sd
##   <chr>      <dbl> <dbl> <dbl> <dbl>
## 1 Africa       9.3  13.6  11.8  2.01
## 2 Europe      18.8  26.7  24.3  3.70
## 3 N..America  21.2  23.8  22.5  1.27
## 4 S..America  12.7  17.3  15.7  2.17

Plotting

2. Introducing the Plotly Package in R

Installing the package:

install.packages("plotly")

The difference between the ggplot2 and plotly packages:

ggplot2 produces static plots and is mostly used during data analysis, while plotly produces interactive plots that can be embedded in dashboards and presented to other people. Plotly is also available in both R and Python.

2.1 What do we need to plot?

We are interested in “Africa”, since it has the lowest irrigation rate of all the continents, and we’d like to plot its rate over the years.

2.2 Plotting all the data

But first let’s look at all the data:

library(plotly)
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
# Margins around the plot area, in pixels
m <- list(
  l = 50,
  r = 50,
  b = 100,
  t = 100,
  pad = 4
)
# Set the trace type, mode and plot size explicitly in plot_ly();
# passing width/height to layout() is deprecated
fig <- plot_ly(data = irrig, x = ~year, y = ~rate,
               type = 'scatter', mode = 'markers',
               width = 500, height = 500,
               marker = list(size = 10,
                             color = 'pink',
                             line = list(color = 'black',
                                         width = 1)))
fig <- fig %>% layout(autosize = FALSE, margin = m)
fig

2.3 Plotting with a line

Now we add the continents as well, to make the plot easier to read:

library(plotly)
# Margins around the plot area, in pixels
m <- list(
  l = 50,
  r = 50,
  b = 100,
  t = 100,
  pad = 4
)
# One line per continent; 'lines+markers' keeps the markers that are styled below
fig <- plot_ly(data = irrig, x = ~year, y = ~rate, color = ~continent,
               type = 'scatter', mode = 'lines+markers',
               width = 500, height = 500,
               marker = list(size = 10,
                             color = 'pink',
                             line = list(color = 'black',
                                         width = 1)))
fig <- fig %>% layout(autosize = FALSE, margin = m)
fig
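
Section 2.1 set out to look at Africa on its own. A minimal sketch of that plot, assuming plotly is installed and rebuilding the Africa values by hand from the table at the top (the names africa and fig_africa, and the title and axis labels, are made up for this example):

```r
# Africa values copied by hand from the table at the top of the document
africa <- data.frame(
  year = c(1980, 1990, 2000, 2007),
  rate = c(9.3, 11.0, 13.2, 13.6)
)

# Only build the plot if plotly is available
if (requireNamespace("plotly", quietly = TRUE)) {
  fig_africa <- plotly::plot_ly(
    data = africa, x = ~year, y = ~rate,
    type = "scatter", mode = "lines+markers"
  )
  # Illustrative title and axis labels
  fig_africa <- plotly::layout(
    fig_africa,
    title = "Irrigation rate in Africa",
    xaxis = list(title = "Year"),
    yaxis = list(title = "Rate")
  )
  fig_africa
}
```

With a single series there is no need for the color mapping, and the upward trend for Africa is easier to read than in the combined plot.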

2.4 Conclusion

From the plot we can see that between 2000 and 2007 the rates for North America and Africa were still rising, while South America stayed flat at 17.3 and Europe declined slightly, from 26.7 to 26.3.
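
The direction of each change can also be checked numerically. A sketch, again rebuilding the table by hand from the values at the top (the name change is made up for this example), that subtracts the 2000 rates from the 2007 rates:

```r
# Values copied by hand from the table at the top of the document
irrig2 <- data.frame(
  year       = c(1980, 1990, 2000, 2007),
  Africa     = c(9.3, 11.0, 13.2, 13.6),
  Europe     = c(18.8, 25.3, 26.7, 26.3),
  N..America = c(21.2, 21.6, 23.3, 23.8),
  S..America = c(12.7, 15.5, 17.3, 17.3)
)

# Change in rate between the last two years, per continent
change <- unlist(irrig2[irrig2$year == 2007, -1]) -
  unlist(irrig2[irrig2$year == 2000, -1])

# Africa 0.4, Europe -0.4, N..America 0.5, S..America 0.0
round(change, 1)
```

A positive value means the rate rose between 2000 and 2007, a negative value means it fell, and zero means no change.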

End